
SEARCH KEYWORD -- HEURISTIC CACHE



  Rediscovering the RSync Algorithm

A: Ok, you’re synchronizing this over the web; and what do you use for the synchronization? B: Oh, we implemented the rsync algorithm. A: Uh-huh. And what do you do with really big files? B: The same. A: And you also synchronize folders? B: Yes. A: And how do you do that? B: We iterate over the folder, using the algorithm on every file, recursing over subfolders. A: Can you try two things for me? First, a very large file; and second, a large codebase, and see if it holds. Introduction First ...

   RSync algorithm,Discovery     2012-02-14 10:47:24

  A Python Optimization Anecdote

Hi! I’m Pavel and I interned at Dropbox over the past summer. One of my biggest projects during this internship was optimizing Python for dynamic page generation on the website. By the end of the summer, I optimized many of dropbox.com’s pages to render 5 times faster. This came with a fair share of challenges though, which I’d like to write about today: The Problem Dropbox is a large website with lots of dynamically generated pages. The more pages that are dynamically generat...

   Python,Anecdote,Optimization,Efficiency     2011-10-25 10:33:20

  Java Sequential IO Performance

Many applications record a series of events to file-based storage for later use.  This can be anything from logging and auditing, through to keeping a transaction redo log in an event sourced design or its close relative CQRS.  Java has a number of means by which a file can be sequentially written to, or read back again.  This article explores some of these mechanisms to understand their performance characteristics.  For the scope of this article I will be using pre-a...

   Java,IO,Sequential,Blocking     2012-02-23 07:09:10

  Some Thoughts on Twitter's Availability Problems

As a regular user of Twitter I've felt the waves of frustration wash over me these past couple of weeks as the service has been hit by one outage after another. This led me to start pondering the problem space [especially as it relates to what I'm currently working on at work] and deduce that the service must have some serious architectural flaws which have nothing to do with the reason usually thrown about by non-technical pundits (i.e. Ruby on Rails is to blame). Some of my suspicions ...

   Twitter,Architecture,Availability,Design     2011-08-12 07:39:21

  Three Simple Ways to Improve the Security of Your Web App

It seems like web app security has entered the public consciousness recently, probably as a result of the press covering the activities of groups like Anonymous and incidents like security breaches at several CAs. Here are a couple of quick tips to improve the security of your web apps. Think of these as low-hanging fruit, not as a substitute for thorough analysis of your app’s security. If there’s interest in this topic we can do more posts, too - let us know in the com...

   Web app,Security,X-FRAME-OPTIONS,SSL     2011-12-08 10:10:20

  Build your own internet search engine - Part 2

After starting to build my own internet search engine as described in a previous blog post, I have now read some papers and books about web search engine architecture and information retrieval to complete my hobby project. Here is a list of papers and books that I highly recommend to anybody who is interested in this topic: 1. Google: data structures and algorithms by Petteri Huuhka 2. The Anatomy of a Large-Scale Hypertextual Web Search Engine by the Google founde...

   Search engine,Paper,Database,Data structure     2011-12-22 08:25:59

  A Solution to CPU-intensive Tasks in IO Loops

Back in October 2011, Ted Dziuba infamously said that Node.js is Cancer.  A provocative title to a provocative article.  The only thing it didn’t really provoke in the commentary was much thought ;)  Zing. My interpretation of the article is that Ted holds up the classic blocking-IO process-per-request (or thread-per-request; same difference) model as superior.  Yet we all remember where the blocking-IO forking model got Apache in the early days.  ...

   CPU,Intensive IO loops,Solution,C++     2012-02-06 07:42:40

  Popular Golang JSON libraries evaluation

JSON (JavaScript Object Notation), a prevailing data exchange format, is widely used across platforms and languages. Golang, of course, supports JSON. With its own standard library it can easily process JSON, for example in interfaces like the REST API of the API Service in Kubernetes. Although Go’s library works great, we can still look to open-source JSON libraries on GitHub to maximize our efficiency. Then the features, performance, applicability of these ...

   FASTJSON,JSON LIB,JSON LIB COMPARISON,GO-JSON     2021-12-11 23:13:23

  Programming Language Readability

Let’s compare some Python to Haskell for solving the same problem.  The problem we’ll pick is a trie data structure for auto-completions.  We are interested not so much in the nitty-gritty of the algorithm, but in the language style itself.  Auto-complete has been in the programming news a lot recently; both a Python and a Haskell solver have turned up. (I suspect this post got flagged on Hacker News :(  It never got on the front page despite the rapid upvoting on a n...

   Programming,Readability,Python,Haskell     2012-02-27 04:52:02

  Decision Trees in C#

Decision trees are simple predictive models which map input attributes to a target value using simple conditional rules. Trees are commonly used in problems whose solutions must be readily understandable or explainable by humans, such as in computer-aided diagnostics and credit analysis. Introduction Decision Trees give a direct and intuitive way of obtaining the classification of a new instance f...

   C#,Decision tree     2012-03-23 10:00:56